Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis

نویسندگان

Kai Yu

Heiga Zen

François Mairesse

Steve J. Young

چکیده

To achieve natural high quality synthesized speech in HMM-based speech synthesis, the effective modelling of complex acoustic and linguistic contexts is critical. Traditional approaches use context-dependent HMMs with decision tree based parameter clustering to model the full combinatorial of contexts. However, weak contexts, such as word-level emphasis in natural speech, are difficult to capture using this approach. Also, due to combinatorial explosion, incorporating new contexts within the traditional framework may easily lead to the problem of insufficient data coverage. To effectively model weak contexts and reduce the data sparsity problem, different types of contexts should be treated independently. Context adaptive training provides a structured framework for this whereby standard HMMs represent normal contexts and transforms represent the additional effects of weak contexts. In contrast to speaker adaptive training in speech recognition, separate decision trees have to be built for different types of context factors. This paper describes the general framework of context adaptive training and investigates three concrete forms: MLLR, CMLLR and CAT based systems. Experiments on a word-level emphasis synthesis task show that all context adaptive training approaches can outperform the standard full-contextdependent HMM approach. However, the MLLR based system achieved the best performance. ! 2011 Elsevier B.V. All rights reserved.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Speaker and language adaptive training for HMM-based polyglot speech synthesis

This paper proposes a novel technique for speaker and language adaptive training for HMM-based statistical parametric polyglot speech synthesis. Language-specific context-dependencies in the system are captured using CAT with cluster-dependent decision trees. Acoustic variations caused by speaker characteristics are handled by CMLLR-based transforms. This framework allows multi-speaker/multi-la...

متن کامل

Context adaptive training with factorized decision trees for HMM-based speech synthesis

To achieve natural high quality synthesised speech in HMMbased speech synthesis, the effective modelling of complex acoustic and linguistic contexts is critical. Traditional approaches use context-dependent HMMs with decision tree based parameter clustering to model the full combination of contexts. However, weak contexts, such as word-level emphasis in neutral speech, are difficult to capture ...

متن کامل

Dialogue context sensitive speech synthesis using factorized decision trees

This paper extends our recent work on rich context utilization for expressive speech synthesis in spoken dialogue systems in which significant improvements to the appropriateness of HMM-based synthetic voices were achieved by introducing dialogue context into the decision tree state clustering stage. Continuing in this direction, this paper investigates the performance of dialogue context-sensi...

متن کامل

Deep neural network-based statistical parametric speech synthesis system using improved time-frequency trajectory excitation model

This paper proposes a deep neural network (DNN)-based statistical parametric speech synthesis system using an improved time-frequency trajectory excitation (ITFTE) model. The ITFTE model, which efficiently reduces the parametric redundancy of a TFTE model, improved the perceptual quality of the vocoding process and the estimation accuracy of the training process. However, there remain problems ...

متن کامل

Overview of NIT HMM - based speech synthesis system for Blizzard Challenge 2011

This paper describes a hidden Markov model (HMM) based speech synthesis system developed for the Blizzard Challenge 2011. In the Blizzard Challenge 2011, we focused on the training algorithm for HMM-based speech synthesis systems. To alleviate the local maxima problems in the maximum likelihood estimation, we apply the deterministic annealing expectation maximization (DAEM) algorithm for traini...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

Speech Communication

دوره 53 شماره

صفحات -

تاریخ انتشار 2011

Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis

نویسندگان

چکیده

منابع مشابه

Speaker and language adaptive training for HMM-based polyglot speech synthesis

Context adaptive training with factorized decision trees for HMM-based speech synthesis

Dialogue context sensitive speech synthesis using factorized decision trees

Deep neural network-based statistical parametric speech synthesis system using improved time-frequency trajectory excitation model

Overview of NIT HMM - based speech synthesis system for Blizzard Challenge 2011

عنوان ژورنال:

اشتراک گذاری